Associating a web session with a household member

ABSTRACT

A method for associating a web session with a particular member of a group of users includes: receiving a plurality of training web sessions, each training web session including one or more web events generated by a respective known user having one or more demographic attributes; training one or more binary classifiers using the training web sessions and the demographic attributes of the users; receiving a plurality of target web sessions, each target web session including one or more web events that are generated by a respective unknown member of a group of users, wherein each user has one or more demographic attributes; and applying one or more of the binary classifiers to the target web sessions such that a respective target web session is uniquely associated with a member based on, at least in part, the demographic attributes of the member.

TECHNICAL FIELD

The disclosed implementations relate generally to tracking a user's activities on the Internet, and in particular, to a system and method for associating a web session from a household with a particular member of that household who has expressly agreed to have his or her web browsing activities surveyed.

BACKGROUND

Nowadays, people at home spend more and more time on the Internet for different purposes, such as checking news and other information, on-line shopping, exchanging information via email or social networking sites, enjoying entertainment such as video or audio clips, etc. A household typically has multiple Internet-accessible devices, such as PCs, smartphones, tablets, game consoles and televisions. Although it is possible to keep track of all the web browsing activities (also known as web events) originating from a particular household, it is difficult to associate a particular web event (e.g., a visit to a particular website during a particular time period) with a particular member of the household in a less intrusive manner. This is especially true if the household includes multiple members who may use different devices to access the Internet at the same time.

SUMMARY

In accordance with some implementations described below, a computer-implemented method for associating a web session with a particular member of a group of users is implemented at a computer system having one or more processors and memory. The method includes: receiving a plurality of training web sessions, each training web session including one or more web events generated by a respective known user having one or more demographic attributes; training one or more binary classifiers using the plurality of training web sessions and the demographic attributes of the associated users; receiving a plurality of target web sessions, each target web session including one or more web events that are generated by a respective unknown member of a group of users, wherein each user has one or more demographic attributes; and applying one or more of the binary classifiers to the plurality of target web sessions such that a respective target web session is uniquely associated with a member of the group of users based on, at least in part, the demographic attributes of the member of the group of users.

In accordance with some implementations described below, a computer system for associating a web session with a particular member of a group of users is disclosed, the computer system including one or more processors and memory storing one or more programs to be executed by the one or more processors. The one or more programs include instructions for: receiving a plurality of training web sessions, each training web session including one or more web events generated by a respective known user having one or more demographic attributes; training one or more binary classifiers using the plurality of training web sessions and the demographic attributes of the associated users; receiving a plurality of target web sessions, each target web session including one or more web events that are generated by a respective unknown member of a group of users, wherein each user has one or more demographic attributes; and applying one or more of the binary classifiers to the plurality of target web sessions such that a respective target web session is uniquely associated with a member of the group of users based on, at least in part, the demographic attributes of the member of the group of users.

In accordance with some implementations described below, a non-transitory computer-readable storage medium storing one or more programs for associating a web session with a particular member of a group of users is disclosed. The one or more programs include instructions for: receiving a plurality of training web sessions, each training web session including one or more web events generated by a respective known user having one or more demographic attributes; training one or more binary classifiers using the plurality of training web sessions and the demographic attributes of the associated users; receiving a plurality of target web sessions, each target web session including one or more web events that are generated by a respective unknown member of a group of users, wherein each user has one or more demographic attributes; and applying one or more of the binary classifiers to the plurality of target web sessions such that a respective target web session is uniquely associated with a member of the group of users based on, at least in part, the demographic attributes of the member of the group of users.

In accordance with some implementations described below, a computer-implemented method for defining a demographic profile for visitors of a website is implemented at a computer system having one or more processors and memory. The method includes: receiving a plurality of web sessions, wherein each web session includes at least one web event associated with a particular website, which is generated by a respective unknown member of a group of users, each user having one or more demographic attributes; applying one or more demographic binary classifiers to the plurality of web sessions such that each web session is uniquely associated with a member of the group of users based on, at least in part, the demographic attributes of the member of the group of users; and defining a demographic profile for visitors of the particular website based on an aggregation of the demographic attributes associated with respective members of the group of users.

BRIEF DESCRIPTION OF DRAWINGS

The aforementioned implementation of the invention as well as additional implementations will be more clearly understood as a result of the following detailed description of the various aspects of the invention when taken in conjunction with the drawings. Like reference numerals refer to corresponding parts throughout the several views of the drawings.

FIG. 1A is a block diagram illustrating a client-server computer system including a plurality of client devices at different households that communicate with a server system that is responsible for associating a web event from a household with a particular member of the household in accordance with some implementations.

FIG. 1B is a block diagram illustrating the components of a feature extractor in the server system in accordance with some implementations.

FIG. 2A is a block diagram illustrating a data structure used for managing session-based web logs in accordance with some implementations.

FIG. 2B is a block diagram illustrating a data structure used for managing session-based feature vectors in accordance with some implementations.

FIG. 2C is a block diagram illustrating a data structure used for managing a demographic dataset in accordance with some implementations.

FIG. 2D is a block diagram illustrating a data structure used for managing a map between web sessions and household members in accordance with some implementations.

FIG. 3 is a block diagram illustrating the components of a computer server for processing raw web events from a household into web sessions and assigning each web session to a particular member of the household in accordance with some implementations.

FIG. 4A is a flow chart illustrating how to classify web sessions and associate each web session with a member of a group of users in accordance with some implementations.

FIG. 4B is a flow chart illustrating how to define a demographic profile for visitors of a website in accordance with some implementations.

FIG. 5A is a block diagram illustrating a process of sessionizing web events into multiple web sessions and associating each web session with a particular household member in accordance with some implementations.

FIG. 5B is a block diagram illustrating a process of using binary classifiers to partition web sessions such that each web session may be uniquely associated with a particular household member in accordance with some implementations.

DETAILED DESCRIPTION

From the perspective of a content or service provider on the Internet, it is desirable to know demographic information about consumers of its content or services. For example, a cosmetic product provider would find it very valuable to know what type of customers (e.g., male or female, young or adult) are the most frequent visitors to its website. Based on such information, the cosmetic product provider can adjust the products listed on its website so that the frequent visitors are able to find more relevant products. Some implementations determine the demographic information of visitors to a website by engaging a group of Internet users (e.g., members of a household) who have, expressly or implicitly, agreed to have their web browsing activities logged, and then analyzing the activity logs to determine which user has visited particular websites and when. This approach has its own limitations, as a household member may access the Internet from any number of devices, such as a smartphone or a tablet computer, at different times of the day, unless the household member agrees to report his or her web browsing activities by, e.g., logging into a particular account whenever he or she starts using the Internet and then logging out of the account whenever he or she stops. Without this cumbersome approach, it would be difficult to tell with high confidence which household member is responsible for a particular mouse click that triggers a visit to a particular website, even if the household has as few as two members.

On the other hand, different members of a household may choose to visit different websites. Website preferences demonstrated by different household members are often related to their age, gender, and occupation differences as well as differences in educational level. For example, it is more likely for an adult member, rather than a teenage member of the same household, to visit a news or financial planning website. Similarly, a teenage household member is more likely to spend time on a social network website than the adult member. Although it cannot be ruled out that a visit to a particular financial planning website was initiated by a teenager in a household, a series of web browsing activities by one member of a household during a short time period often provides enough information to serve as a fingerprint of that household member's characteristic web activity. This observation is especially applicable to a small group of users such as a household having four or five members with distinct demographic attributes. Once a particular household member is identified as being responsible for a series of web browsing activities at one or more websites, it is possible to define or draw a demographic profile for visitors of a particular website.

FIG. 1A is a block diagram illustrating a client-server computer system 100 including a plurality of client devices 112 at different households 110 that communicate over a communication network 120 (e.g., Internet) with web servers 130 and a server system 140 that is responsible for associating a web event from a household with a particular member of the household in accordance with some implementations. As shown in FIG. 1A, a household 110 may include multiple client devices 112 such as one or more PCs, smartphones, tablets, game consoles, or Internet-enabled TVs or set top boxes. A household member uses a client device 112 to submit requests to the web servers 130 through, e.g., a web browser application 111 and a communication interface 115 of the client device. In response, the web servers 130 return the information or services requested by the household members for display to the household member on the client device 112. In some implementations, a monitor module 113 (which is typically a software application) is installed in the client device, e.g., as a plug-in of the web browser 111 to keep track of the web browsing activities at the client device. In some other implementations, a monitor module 119 is installed in a router 117 within the household to achieve the same purpose by logging the web browsing activities through the router 117.

In the present application, no monitor is installed in any client device or router within a household until after the household has expressly or implicitly agreed that its members' web browsing activities can be logged and analyzed by a third party (e.g., the server system 140). For example, a household may authorize such activity as part of a contract it signs with a service provider that manages the server system 140. The contract may also require that the household provide demographic information of its household members, e.g., age, gender, occupation, educational level, etc. As will be described below, because the web browsing activities at different households are aggregated at the server system 140 and shared with others in an anonymous way, it is very difficult, if not impossible, for anyone to uncover a particular household member's specific web browsing activities from the aggregated data.

The server system 140 includes a communication interface 141 for exchanging information with other entities on the Internet. For example, the web browsing activities from different households arrive at the communication interface 141 and are then stored in the web log database 143, e.g., in the form of web event records. The server system 140 may also provide demographic information about visitors of a particular website or web browsing information of a particular demographic group to a requesting client device through the communication interface 141.

As noted above, an individual web event (e.g., a visit to a news website) associated with a household member may not be sufficient for representing the household member's unique “taste” for information or services. One of the first steps taken by the server system 140 (or more specifically, the sessionizer 145) after receiving a plurality of web logs from a household is to partition the web logs into different sessions. FIG. 5A is a block diagram illustrating a process of sessionizing web events into multiple web sessions and associating each web session with a particular household member in accordance with some implementations. As shown in the figure, there are a plurality of web events (501-1, 501-2, . . . , 501-N) associated with the household 501. These web events may be generated by the same client device or different client devices in the same household and are ordered by their respective timestamps indicating when they were generated or received by the server system 140. There are many known methods of sessionizing web events. For example, different client devices have different IP addresses or MAC addresses. By checking these parameters, web events coming from different client devices are separated from each other. For example, the web events 501-1 and 501-3 may be generated by a PC in the household and the web events 501-2 and 501-4 may be generated by a tablet in the same household.

But web events from the same client device are not necessarily associated with the same household member if the client device is being shared by different household members. Therefore, different web events from the client device may be divided into multiple web sessions if they are generated by different household members. Different rules may be used to partition the web events from the same client device. For example, it is assumed that a time difference between two consecutive web events within a web session should be less than ten minutes. In other words, if there is a significant time gap between two consecutive web events, in some implementations the sessionizer 145 creates a new session for the latter of the two consecutive web events. Similarly, it is assumed that a time difference between the first web event and the last web event of one web session should be less than 30 minutes. This rule is used to prevent a web session from having too many web events. As noted above, the present application assumes that there is a demographic-based difference inherent in web browsing activities such that different people may choose to visit different types of websites, which are similar to different frequency bands of a color spectrum. A long web session that includes web events associated with many diverse websites, which is similar to a broad range of the color spectrum, would be difficult to associate uniquely with any particular household member. Conversely, it is almost harmless if the sessionizer 145 partitions a series of web events associated with one household member into two or more consecutive web sessions as long as each individual web session has enough information to be uniquely associated with a particular household member because, as described below, the multiple web sessions would be associated with the same household member by the classifiers employed by the server system 140. In some implementations, it is assumed that a web session should include at least 10 web events but should not span a time period of more than 30 minutes.
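The gap and span rules above can be illustrated with a short sketch. This is only one possible reading of the sessionizer 145, not its actual implementation: the names `WebEvent`, `sessionize`, `MAX_GAP_SECONDS` and `MAX_SPAN_SECONDS` are hypothetical, events are separated by a device identifier as described earlier, and the minimum event count mentioned in the text is not enforced here.

```python
from dataclasses import dataclass
from typing import Dict, List

# Thresholds taken from the rules in the text; the constant names are assumptions.
MAX_GAP_SECONDS = 10 * 60   # consecutive events in a session are under ten minutes apart
MAX_SPAN_SECONDS = 30 * 60  # first and last events of a session are under 30 minutes apart


@dataclass
class WebEvent:
    device_id: str    # e.g., a MAC address (hypothetical field name)
    url: str
    timestamp: float  # seconds since some reference time


def sessionize(events: List[WebEvent]) -> List[List[WebEvent]]:
    """Split a household's events into per-device sessions using the gap and span rules."""
    sessions: List[List[WebEvent]] = []
    open_sessions: Dict[str, List[WebEvent]] = {}  # device_id -> session still being filled
    for event in sorted(events, key=lambda e: e.timestamp):
        current = open_sessions.get(event.device_id)
        if current is not None:
            gap = event.timestamp - current[-1].timestamp
            span = event.timestamp - current[0].timestamp
            if gap >= MAX_GAP_SECONDS or span >= MAX_SPAN_SECONDS:
                current = None  # close the session; this event starts a new one
        if current is None:
            current = []
            sessions.append(current)
            open_sessions[event.device_id] = current
        current.append(event)
    return sessions
```

Splitting one member's activity into two consecutive sessions is tolerated by this design, consistent with the observation above that both sessions would later be assigned to the same member.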

As shown in FIG. 1A, the output of the sessionizer 145 is a plurality of session-based web logs 147 stored in a database. FIG. 2A is a block diagram illustrating a data structure used for managing session-based web logs in accordance with some implementations. The session-based web logs 147 are broken into different households (200A, 200B). In particular, the household 200A includes multiple sessions (210A, 210B) and each session 210A further includes attributes such as a session ID 220A, a client device ID 220B (e.g., the client device's MAC address), the operating system 220C used by the client device, the browser name 220D, the browser version 220E, and geographical information such as country 220F and city 220G as well as a plurality of web events 230A to 230N. Each web event 230A further includes a type of event 240A (e.g., a user click of a URL in a web page or a user input of the URL in the address field of the web browser), a URL 240B visited by the user, and a timestamp 240C associated with the user visit. Note that the session attributes described above are for illustrative purposes and some of the session attributes may be optional whereas some other attributes may be added to the data structure.
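For illustration only, the nesting of FIG. 2A (household, then sessions, then web events) might be modeled as below. The class and field names are assumptions chosen to mirror the attributes listed above, `household_id` is a hypothetical key that the figure description does not name, and the text itself notes that some attributes are optional.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WebEventRecord:
    event_type: str   # type of event, e.g. "click" or "typed_url" (illustrative values)
    url: str          # URL visited by the user
    timestamp: float  # time of the visit


@dataclass
class SessionRecord:
    session_id: str
    client_device_id: str  # e.g., the client device's MAC address
    operating_system: str
    browser_name: str
    browser_version: str
    country: str
    city: str
    events: List[WebEventRecord] = field(default_factory=list)


@dataclass
class HouseholdLog:
    household_id: str  # hypothetical identifier for the household
    sessions: List[SessionRecord] = field(default_factory=list)


def session_duration(session: SessionRecord) -> float:
    """Time between the first and last event of a session (0.0 if it has no events)."""
    if not session.events:
        return 0.0
    times = [e.timestamp for e in session.events]
    return max(times) - min(times)
```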

The feature extractor 149 receives the session-based web logs 147, generates a feature vector for each web session, and stores the feature vectors in the session-based feature vector database 151. In some implementations, a feature is an abstraction of the content or service provided by a website such as fashion, news, IT, etc. In some other implementations, a large website may be qualified as a feature by itself. Because a web session includes one or more web events, each web event including at least one website, features for characterizing the web session are indeed built on top of features for characterizing individual websites associated with the web session. Since different websites provide different types of contents or services, a unified model is important for representing the content and/or service offered by different websites and comparing them side by side to determine the differences between websites. Over the years, people have developed various models for categorizing the information on the Internet, one of which is the “Open Directory Project” at www.dmoz.org. It should be noted that the present application works with many known website classification models for building a feature vector for a web session that includes one or more web events.

In some implementations, the feature extractor 149 typically is configured to perform the following operations: (i) extract all the possible features from a web event associated with a web session based on a website categorization model; (ii) select, among the extracted features, those features that are more likely to be tied to a particular demographic class of users; (iii) aggregate the features selected from different events into a set of session-level features; and (iv) delete some of the session-level features that are of little help in identifying a household member. When aggregating the event-level features into the session-level features, there are two possible approaches: (i) a binary approach or (ii) a cumulative approach. The binary approach adds an event-level feature to the set of session-level features only if the event-level feature has no counterpart in the set of session-level features. In other words, the binary approach does not accumulate the same event-level feature associated with multiple websites within one web session. In contrast, the cumulative approach counts the number of times a particular event-level feature occurs if it is associated with multiple websites within one web session and associates that count with the corresponding session-level feature as its weight, which is used to help determine which session-level feature(s) should be deleted. In some implementations, the set of session-level features resulting from the cumulative approach is binary-encoded by eliminating the weights associated with different features. FIG. 2B is a block diagram illustrating a data structure used for managing session-based feature vectors 151 in accordance with some implementations. The data structure includes multiple feature vectors (250-1, 250-2), one for each web session. Each feature vector includes a session ID 251 and a set of features (253-A, 253-B), each feature including a feature ID 255 and a feature value 257.
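The two aggregation approaches can be contrasted in a short sketch. The function names are hypothetical, features are represented as plain strings, and pruning by a minimum weight is an assumed rule: the text says only that the weight helps decide which session-level features to delete.

```python
from collections import Counter
from typing import Dict, Iterable, List


def aggregate_binary(event_features: Iterable[List[str]]) -> Dict[str, int]:
    """Binary approach: a feature is recorded once, however many events carry it."""
    session: Dict[str, int] = {}
    for features in event_features:
        for feature in features:
            session.setdefault(feature, 1)
    return session


def aggregate_cumulative(event_features: Iterable[List[str]]) -> Dict[str, int]:
    """Cumulative approach: the weight is the number of events carrying the feature."""
    counts: Counter = Counter()
    for features in event_features:
        counts.update(set(features))  # count each feature at most once per event
    return dict(counts)


def prune(session_features: Dict[str, int], min_weight: int) -> Dict[str, int]:
    """Drop low-weight features (an assumed deletion rule, for illustration only)."""
    return {f: w for f, w in session_features.items() if w >= min_weight}


def binarize(session_features: Dict[str, int]) -> Dict[str, int]:
    """Binary-encode a cumulative result by discarding its weights."""
    return {f: 1 for f in session_features}
```

For a session whose three events carry the features `["news"]`, `["news", "travel"]` and `["education"]`, the binary approach gives every feature the value 1, while the cumulative approach gives `news` a weight of 2 and the others a weight of 1.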

In some implementations, a feature vector is a multi-component vector defined in a multi-dimensional feature space and the feature vector typically has one or more non-zero values, each one corresponding to a feature associated with the web session. In some implementations, the magnitude of a non-zero value is defined as a weight of the corresponding feature in the feature space. For example, assuming that a web session includes a first web event associated with www.cnn.com, a second web event associated with www.united.com, and a third web event associated with www.stanford.edu, a resulting feature vector may have at least a non-zero component corresponding to news or media, a non-zero component corresponding to transportation, and a non-zero component corresponding to education.

FIG. 1B is a block diagram illustrating the components of the feature extractor 149 in the server system in accordance with some implementations. The feature extractor 149 includes a feature extractor API 161 for receiving web session data from the session-based web log database 147. For illustrative purposes, three particular types of feature extractors are depicted in FIG. 1B. The verticals feature extractor 163 extracts verticals (note that “vertical” is a well-known term in the art of web advertisement) from different websites associated with a particular web session. The website demographic feature extractor 165 focuses on those features that are more likely to be associated with a group of users having specific demographic characteristics. For example, the set of features (“Top Gear”, “Autotrader”, “Slashshot”) and the set of features (“OMG”, “Boutiques.com”, “Elle”) most likely point to two gender-based demographic categories, male and female. The other feature extractor 167 is responsible for extracting additional features that are deemed to be useful for identifying a particular household member for the web session. Finally, a feature aggregator 169 combines the features extracted by different extractors into a set of session-level feature vectors as described above and stores them in the session-based feature vector database 151.

One or more classifiers 154 then process the feature vectors and try to identify a unique household member for each feature vector (and therefore for each web session represented by the feature vector). In some implementations, a classifier 154 needs to be trained before it can be used to associate a feature vector with the correct household member. FIG. 1A depicts a mechanism for training the classifiers 154. First, the server system 140 receives a plurality of training web logs 155 and stores them in a database in the server system 140. It is assumed that there is a known relationship between a web event in the training web logs 155 and a respective user who generates the web event. For example, at least some of the web logs are associated with a household that has only one member whose demographic information is known. It is further assumed that the demographic information such as age, gender, occupation, and educational level associated with the user is also known, which is stored in the demographic model 156. If the training web logs 155 have not been sessionized, they will be fed into the sessionizer 145 that converts them into a plurality of training web sessions. If the training web logs 155 have already been sessionized, they will be provided to the feature extractor 149 directly so that the feature extractor 149 generates a feature vector for each training web session.
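The text does not say which learning algorithm the classifiers 154 use. As one hedged illustration, a binary classifier could be trained on labeled training sessions with a small Bernoulli-style naive Bayes model; that choice, the class name `BinaryNaiveBayes`, and the feature names in the usage note are assumptions made for this sketch.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class BinaryNaiveBayes:
    """Toy binary classifier over sets of feature IDs; labels are True or False."""

    def fit(self, sessions: List[Tuple[Set[str], bool]]) -> "BinaryNaiveBayes":
        # Vocabulary is every feature seen in any training session.
        self.vocab: Set[str] = set().union(*(features for features, _ in sessions))
        self.class_count: Dict[bool, int] = {True: 0, False: 0}
        self.feature_count = {True: defaultdict(int), False: defaultdict(int)}
        for features, label in sessions:
            self.class_count[label] += 1
            for feature in features:
                self.feature_count[label][feature] += 1
        return self

    def _log_score(self, features: Set[str], label: bool) -> float:
        n = self.class_count[label]
        total = self.class_count[True] + self.class_count[False]
        score = math.log((n + 1) / (total + 2))  # smoothed class prior
        for feature in self.vocab:
            p = (self.feature_count[label][feature] + 1) / (n + 2)  # Laplace smoothing
            score += math.log(p if feature in features else 1 - p)
        return score

    def predict_proba(self, features: Set[str]) -> float:
        """Estimated probability that the session belongs to the True class."""
        pos = self._log_score(features, True)
        neg = self._log_score(features, False)
        return 1.0 / (1.0 + math.exp(neg - pos))
```

For example, a gender-based classifier could be fit on training sessions labeled True for one class and False for the other; `predict_proba` then returns a score that can double as a per-classifier confidence.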

The feature vectors derived from the training web sessions and the demographic model data 156 are then used to train the classifiers 154. In some implementations, the classifiers 154 include one or more binary classifiers such as an age-based classifier (e.g., age above 30 and age under 30), a gender-based classifier (e.g., male and female), and an educational level classifier (college and non-college). Since most households have no more than five members, an application of two or more binary classifiers to the web sessions from a particular household often produces a very accurate mapping relationship between the web sessions and the corresponding household members. FIG. 5B is a block diagram illustrating a process of using two binary classifiers to partition web sessions such that each web session may be uniquely associated with a particular household member in accordance with some implementations. In this example, the household has four members:

- Mary, a mother at the age of 45;
- John, a father at the age of 55;
- Ann, a daughter at the age of 18; and
- James, a son at the age of 16.

Assuming that a plurality of web sessions have been generated for the household, each web session having a feature vector, an application of the gender-based classifier 520 splits the web sessions into two groups, one group of web sessions associated with female household members (including Mary and Ann) and the other group of web sessions associated with male household members (including John and James). The age-based classifier 525 then splits each group of web sessions into two sub-groups. As a result, each of the web sessions is associated with a unique household member as shown in FIG. 5B. In some implementations, the classification result is independent of the order in which the different binary classifiers are applied. For example, the same result shown in FIG. 5B can be achieved by first applying the age-based classifier 525 and then applying the gender-based classifier 520. In some other implementations, the order of applying different classifiers may affect the final result. For example, the classification process begins with the most polarized classifiers (e.g., the gender-based classifier), so that the two groups of web sessions generated by this classifier are markedly different from each other, and then applies the less polarized classifiers (e.g., the educational-level classifier) in order to achieve the most accurate result.
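The FIG. 5B example can be sketched as successive filtering of the candidate household members. The rule-based functions `toy_gender_classifier` and `toy_age_classifier` are hypothetical stand-ins for trained classifiers, the feature names are invented for the example, and the age boundary of 30 follows the example given earlier in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Member:
    name: str
    age: int
    gender: str


def gender_bucket(member: Member) -> str:
    return member.gender


def age_bucket(member: Member) -> str:
    return "over_30" if member.age > 30 else "under_30"


# Hypothetical stand-ins for trained binary classifiers.
def toy_gender_classifier(features: Set[str]) -> str:
    return "female" if "fashion" in features else "male"


def toy_age_classifier(features: Set[str]) -> str:
    return "over_30" if "news" in features else "under_30"


Stage = Tuple[Callable[[Member], str], Callable[[Set[str]], str]]


def assign_member(features: Set[str], members: List[Member],
                  stages: List[Stage]) -> Optional[Member]:
    """Apply each binary classifier in turn, keeping only the members it is consistent with."""
    candidates = list(members)
    for bucket_of, classify in stages:
        predicted = classify(features)
        candidates = [m for m in candidates if bucket_of(m) == predicted]
    # Only a unique surviving candidate counts as an association.
    return candidates[0] if len(candidates) == 1 else None


HOUSEHOLD = [Member("Mary", 45, "female"), Member("John", 55, "male"),
             Member("Ann", 18, "female"), Member("James", 16, "male")]
STAGES: List[Stage] = [(gender_bucket, toy_gender_classifier),
                       (age_bucket, toy_age_classifier)]
```

Because each stage here only filters candidates, the result in this sketch does not depend on the order of the stages; this corresponds to the order-independent case described above, not the case in which the order affects the result.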

After being trained by the training web logs 155 and the demographic model 156, the classifiers 154 can be applied to feature vectors associated with a plurality of target web sessions (i.e., web sessions from an unknown member of a particular household) and the associated demographic dataset 157. FIG. 2C is a block diagram illustrating a data structure used for managing the demographic dataset 157 in accordance with some implementations. For each household (260-1 or 260-2), there is a household ID 262, a household address 264, household income level 268 and personal information of each household member (269-A or 269-B) including member ID 271, age 272, gender 273, education level 274, occupation 275, etc. As noted above, the household has provided such demographic data when it enters into a contract with a third-party agency to allow the agency to log all the web browsing activities from the household. In some implementations, the demographic model 156 is part of the demographic dataset 157.

An application of the classifiers 154 to the session-based feature vectors 151 and the associated demographic dataset generates a session-user map 159. FIG. 2D is a block diagram illustrating a data structure used for managing the session-user map between web sessions and members of a particular household in accordance with some implementations. For each web session (280-1 or 280-2), there are a session ID 281 and a set of demographic features associated with the web session including age 282, gender 283, education 284, and occupation 285, which are determined by the classifiers 154, and information about a particular household member who is identified as being associated with the web session including a household member ID 286 and a confidence level 287. The confidence level is a parameter that indicates the statistical accuracy of the classifiers 154. In some implementations, each binary classifier provides its own confidence level for a particular household and the confidence level 287 is a function of the confidence levels associated with different classifiers (e.g., the lowest one).
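A minimal sketch of one session-user map entry, with the combined confidence taken as the lowest per-classifier confidence (the example function given in the text). The class name, field names and value types are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SessionUserEntry:
    session_id: str
    age: str                  # demographic features determined by the classifiers
    gender: str
    education: str
    occupation: str
    household_member_id: str
    confidence: float         # combined confidence level for the association


def combine_confidence(per_classifier: Dict[str, float]) -> float:
    """Combine per-classifier confidence levels by taking the lowest one."""
    if not per_classifier:
        raise ValueError("at least one classifier confidence is required")
    return min(per_classifier.values())
```

Taking the minimum is conservative: the association is treated as only as reliable as the least certain classifier that contributed to it.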

The session-user map 159 has multiple uses. For example, an advertisement agency can query the session-user map to determine what type of web sessions (e.g., in the form of feature vectors, verticals or specific websites) is popular among a particular demographic group of users and then provide target-oriented advertisements on the websites associated with the web sessions. Alternatively, the session-user map 159 can be used by the website profiler 161 to get a profile for visitors to a particular website. As noted above, each web session includes one or more web events, each web event being associated with a particular website. In order to improve the accuracy of the classification result, web events are grouped into different web sessions. But once it is determined that a web session is associated with a particular household member, all the web events in that web session are attributed to the same household member. Using the session-user map, it is possible to identify all the visitors from many households to a particular website during a specific time period and their associated demographic information. For example, the website profiler 161 can query the session-user map and determine the demographic composition of visitors to a particular website. Based on such information, an advertisement agency can choose to display advertisements that are more relevant to visitors of the website.
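As a rough illustration of how the website profiler 161 might aggregate demographics from the session-user map, the sketch below counts each distinct visitor once per attribute value. The function name and the three dictionary shapes are assumptions for this example and are not the data structures of FIG. 2C or FIG. 2D.

```python
from collections import Counter
from typing import Dict, Set


def profile_website(site: str,
                    session_sites: Dict[str, Set[str]],
                    session_user_map: Dict[str, str],
                    demographics: Dict[str, Dict[str, str]]) -> Dict[str, Counter]:
    """Aggregate the demographic attributes of the distinct members who visited a site."""
    # Members with at least one mapped session that includes the site.
    visitors = {session_user_map[sid]
                for sid, sites in session_sites.items()
                if site in sites and sid in session_user_map}
    profile: Dict[str, Counter] = {}
    for member_id in visitors:
        for attribute, value in demographics[member_id].items():
            profile.setdefault(attribute, Counter())[value] += 1
    return profile
```

Counting distinct members, not sessions, is a design choice made here; weighting by the number of sessions would instead describe visit volume per demographic group.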

FIG. 3 is a block diagram illustrating the components of a computer server system 140 for processing raw web events from a household into web sessions and assigning each web session to a particular member of the household in accordance with some implementations. The computer server system 140 includes one or more processing units (CPU's) 302 for executing modules, programs and/or instructions stored in memory 312 and thereby performing processing operations; one or more network or other communications interfaces 310; memory 312; and one or more communication buses 314 for interconnecting these components. In some implementations, the computer server system 140 includes a user interface 304 comprising a display device 308 and one or more input devices 306 (e.g., keyboard or mouse). In some implementations, the memory 312 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some implementations, memory 312 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In some implementations, memory 312 includes one or more storage devices remotely located from the CPU(s) 302. Memory 312, or alternately the non-volatile memory device(s) within memory 312, comprises a non-transitory computer readable storage medium. In some implementations, memory 312 or the computer readable storage medium of memory 312 stores the following elements, or a subset of these elements, and may also include additional elements:

-   -   an operating system 316 that includes procedures for handling various basic system services and for performing hardware dependent tasks;
    -   a network communications module 318 that is used for connecting the computer server system 140 to other computers via the communication network interfaces 310 and one or more communication networks (wired or wireless), such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
    -   a sessionizer module 145 for grouping web events from a household into different web sessions as described above in connection with FIG. 5A;
    -   a feature extractor module 147 for extracting a set of features from a web session based on a website classification model, the feature extractor module 147 further including a verticals feature extractor 163, a website demographic feature extractor 165, other feature extractors 167, and a feature aggregator 169, which are described above in connection with FIG. 1B;
    -   one or more classifier modules 154 for partitioning web sessions associated with a household into two or more groups based on their respective feature vectors, the classifier modules 154 further including an age-based classifier module 154-1, a gender-based classifier module 154-3, and others such as occupation-based classifier module and education-level based classifier module;
    -   a website profiler module 161 for determining the demographic information of visitors to a particular website based on the session-user map generated by the classifiers 154 as described above in connection with FIG. 1A;
    -   a plurality of web logs 143, which correspond to the raw web browsing data from a particular household, the web logs 143 further including a plurality of training web logs 155 that are used for training the classifiers 154;
    -   a plurality of session-based web logs 147 that include a plurality of web sessions derived from the web logs 143;
    -   a plurality of session-based feature vectors 151, each feature vector including a set of features derived from a particular web session;
    -   a demographic dataset 157 including the demographic data for a plurality of households that have agreed that their web browsing activities be logged and analyzed by the computer server system 140, the demographic dataset 157 further including a demographic model 156 that is used together with the training web logs 155 for training the classifiers 154 and the demographic model 156 further including a map between a respective feature vector 156-1 associated with a web session and a set of demographic attributes 156-2 of a user that generates the web session; and
    -   a session-user map 159 that maps a respective web session from the session-based web logs 147 to a particular household member whose demographic attributes are stored in the demographic dataset 157.

In sum, different components within the computer server system 140 work in concert to associate a respective web browsing activity with a member of a particular household.

FIG. 4A is a flow chart illustrating how the computer server system 140 classifies web sessions and associates each web session with a member of a group of users in accordance with some implementations. Note that the group of users may be members of a household that has agreed to report its web events to a web survey entity. Initially, the computer server system 140 receives (401) a plurality of training web sessions, each training web session including one or more web events generated by a respective known user having one or more demographic attributes. The demographic attributes may include at least one of gender, age, education level, and occupation. In some implementations, at least one of the training web sessions is generated by a household that has only one household member.

The computer server system 140 trains (403) one or more binary classifiers using the plurality of training web sessions and the demographic attributes of the associated users. For example, the computer server system 140 extracts a set of features from the web events of a training web session. Since the web events are generated by a known user (e.g., the only member of a household) whose demographic attributes are known to the computer server system 140 (e.g., being stored in the demographic model 156), the computer server system 140 defines a map between the set of features and one or more of the demographic attributes associated with the user. For each training web session, the computer server system 140 repeats the extracting and defining operations and aggregates the maps associated with respective training web sessions into a demographic model 156.
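By way of a non-limiting illustration, the training operation (403) might be sketched as follows. The passage above does not name a particular learning algorithm, so a nearest-centroid rule is used here purely as a stand-in; the function names, the sample feature vectors, and the attribute values are likewise assumptions of this sketch.

```python
def build_demographic_model(training_sessions):
    """Aggregate one (feature vector -> demographic attributes) map per training session."""
    model = []
    for feature_vector, attributes in training_sessions:
        model.append((tuple(feature_vector), dict(attributes)))
    return model

def _centroid(rows):
    # Component-wise mean of a non-empty list of equal-length vectors.
    return [sum(column) / len(rows) for column in zip(*rows)]

def _squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_binary_classifier(model, attribute, positive_value):
    """Return a scoring function; a positive score favors `positive_value`.

    Nearest-centroid is an illustrative choice, not the disclosed classifier.
    """
    positives = [vec for vec, attrs in model if attrs.get(attribute) == positive_value]
    negatives = [vec for vec, attrs in model if attrs.get(attribute) != positive_value]
    if not positives or not negatives:
        raise ValueError("need at least one training session in each class")
    c_pos, c_neg = _centroid(positives), _centroid(negatives)

    def score(feature_vector):
        # Closer to the positive centroid than to the negative one => score > 0.
        return _squared_distance(feature_vector, c_neg) - _squared_distance(feature_vector, c_pos)

    return score

# Hypothetical training sessions from known users (e.g., single-member households).
TRAINING = [
    ([1, 0, 1], {"gender": "male"}),
    ([1, 0, 0], {"gender": "male"}),
    ([0, 1, 0], {"gender": "female"}),
    ([0, 1, 1], {"gender": "female"}),
]
model = build_demographic_model(TRAINING)
is_male = train_binary_classifier(model, "gender", "male")
```

One such scoring function would be trained per demographic attribute (gender, age, and so on), each from the same aggregated model.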

In some implementations, for each web event, the computer server system 140 identifies one or more event-level characteristics (like those described above in connection with FIG. 1A) based on a web content categorization model and then aggregates the event-level characteristics associated with the respective web events into a set of features associated with the web session with each feature having an associated weight. In some implementations, the computer server system 140 orders the set of features based on their respective weights, selects a subset of the features whose weights are higher than a predefined threshold level and then converts the subset of features into a binary vector of a multi-dimensional space defined by the web content categorization model.
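By way of a non-limiting illustration, the aggregation, thresholding, and binary-vector conversion described above might be sketched as follows. The toy categorization table, the website names, and the function names are assumptions of this sketch; the weight is taken to be the number of web events having the feature, consistent with the claims.

```python
from collections import Counter

# Hypothetical stand-in for a web content categorization model:
# each website maps to one or more content categories.
CATEGORY_MODEL = {
    "news.example": ["news"],
    "shop.example": ["shopping"],
    "sports.example": ["sports", "news"],
}
# A fixed category order defines the dimensions of the vector space.
DIMENSIONS = sorted({c for cats in CATEGORY_MODEL.values() for c in cats})

def session_features(events):
    """Aggregate event-level categories into weighted session-level features."""
    weights = Counter()
    for website in events:
        for category in CATEGORY_MODEL.get(website, []):
            weights[category] += 1  # weight = number of events having the feature
    return weights

def to_binary_vector(weights, threshold):
    """Order features by weight, keep those above the threshold, emit a 0/1 vector."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    kept = {name for name, weight in ranked if weight > threshold}
    return [1 if dimension in kept else 0 for dimension in DIMENSIONS]

# One hypothetical web session consisting of four web events.
events = ["news.example", "sports.example", "sports.example", "shop.example"]
weights = session_features(events)
vector = to_binary_vector(weights, threshold=1)
```

With the dimensions ordered as news, shopping, sports, the sample session yields weights of 3, 1, and 2 respectively, so a threshold of 1 keeps only news and sports.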

After training one or more binary classifiers, the computer server system 140 receives (405) a plurality of target web sessions to be associated with a respective member of the group of users. In some implementations, each target web session includes one or more web events that are generated by a respective unknown member of the group of users although the demographic attributes associated with each user are known and stored in the demographic dataset. The computer server system 140 applies one or more of the trained binary classifiers to the plurality of target web sessions such that a respective target web session is uniquely associated with a member of the group of users based on, at least in part, the demographic attributes of the member of the group of users. After this step, it is possible for the computer server system 140 to tell which household member is responsible for a particular visit to a specific website at a particular moment and what activities the household member has conducted while visiting the website.
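By way of a non-limiting illustration, applying the trained binary classifiers to a target web session might be sketched as follows. The passage above does not prescribe how classifier outputs are matched against the known attributes of the household members, so the agreement score used here is an assumption, as are the two hand-written scoring functions and the sample household.

```python
def assign_session(feature_vector, members, classifiers):
    """Pick the household member whose known attributes best agree with the classifiers.

    members:     {member_id: {attribute: value}}
    classifiers: {attribute: (positive_value, score_fn)}, where score_fn(vector) > 0
                 favors positive_value (hypothetical interface).
    """
    def agreement(attributes):
        total = 0.0
        for attribute, (positive_value, score_fn) in classifiers.items():
            score = score_fn(feature_vector)
            # Add the score if the member has the favored value, subtract it otherwise.
            total += score if attributes[attribute] == positive_value else -score
        return total

    # max() returns a single member, so each session gets exactly one owner;
    # ties resolve to the first member in iteration order.
    return max(members, key=lambda member_id: agreement(members[member_id]))

# Hypothetical household and stand-in scoring functions.
household = {
    "dad": {"gender": "male", "age_group": "adult"},
    "mom": {"gender": "female", "age_group": "adult"},
    "son": {"gender": "male", "age_group": "minor"},
}
classifiers = {
    "gender": ("male", lambda v: v[0] - v[1]),
    "age_group": ("adult", lambda v: v[2] - 0.5),
}

session_user_map = {
    "s1": assign_session([1, 0, 1], household, classifiers),
    "s2": assign_session([0, 1, 1], household, classifiers),
    "s3": assign_session([1, 0, 0], household, classifiers),
}
```

Combining several binary classifiers in this way is what lets a household with more than two members be resolved: no single classifier separates all three members, but gender and age group together do.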

As noted above, the session-user map generated by the classifiers may be used for various purposes, one of which is depicted in FIG. 4B, i.e., how to define a demographic profile for visitors of a website in accordance with some implementations. As described above in connection with FIG. 4A, the computer server system 140 first receives (411) a plurality of web sessions. In some implementations, each web session includes at least one web event associated with a particular website, which is generated by a respective unknown member of a group of users, and each user (e.g., a member of a particular household) has one or more demographic attributes known to the computer server system 140. The computer server system 140 then applies (413) one or more demographic binary classifiers to the plurality of web sessions such that each web session is uniquely associated with a member of the group of users based on, at least in part, the demographic attributes of the member of the group of users. Finally, the computer server system 140 defines (415) a demographic profile for visitors of the particular website based on an aggregation of the demographic attributes associated with respective members of the group of users.

In some implementations, the demographic profile includes at least one of (i) a gender ratio between male visitors of the website and female visitors of the website, (ii) an age ratio between visitors of the website whose ages are above a threshold level and visitors of the website whose ages are below the threshold level, and (iii) an educational level ratio between visitors of the website whose educational levels are above a threshold level and visitors of the website whose educational levels are below the threshold level.
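By way of a non-limiting illustration, the gender ratio and age ratio of the demographic profile (415) might be computed as sketched below. The data layouts and names are assumptions of this sketch, as are three choices the text leaves open: visitors are counted once per distinct member rather than once per session, an age equal to the threshold is counted as above it, and a ratio with a zero denominator is reported as `None`.

```python
def _ratio(numerator, denominator):
    # Undefined ratios are reported as None rather than raising.
    return numerator / denominator if denominator else None

def website_profile(session_user_map, session_sites, members, website, age_threshold):
    """Aggregate the attributes of members attributed to sessions that visited `website`.

    session_user_map: {session_id: member_id}
    session_sites:    {session_id: set of websites visited in the session}
    members:          {member_id: {"gender": ..., "age": ...}}
    """
    visitor_ids = {
        session_user_map[session_id]
        for session_id, sites in session_sites.items()
        if website in sites and session_id in session_user_map
    }
    visitors = [members[member_id] for member_id in visitor_ids]
    males = sum(1 for v in visitors if v["gender"] == "male")
    females = sum(1 for v in visitors if v["gender"] == "female")
    older = sum(1 for v in visitors if v["age"] >= age_threshold)
    younger = len(visitors) - older
    return {
        "gender_ratio": _ratio(males, females),
        "age_ratio": _ratio(older, younger),
    }

# Hypothetical sample data.
members = {
    "a": {"gender": "male", "age": 40},
    "b": {"gender": "female", "age": 35},
    "c": {"gender": "male", "age": 12},
}
session_sites = {
    "s1": {"news.example"},
    "s2": {"news.example", "shop.example"},
    "s3": {"news.example"},
    "s4": {"shop.example"},
}
session_user_map = {"s1": "a", "s2": "b", "s3": "c", "s4": "a"}

news_profile = website_profile(session_user_map, session_sites, members, "news.example", 18)
shop_profile = website_profile(session_user_map, session_sites, members, "shop.example", 18)
```

An educational level ratio would follow the same pattern, given an ordered encoding of educational levels.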

Although some of the various drawings illustrate a number of logical stages in a particular order, stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art and so do not present an exhaustive list of alternatives. Moreover, it should be recognized that the stages could be implemented in hardware, firmware, software or any combination thereof.

The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various implementations with various modifications as are suited to the particular use contemplated. Implementations include alternatives, modifications and equivalents that are within the spirit and scope of the appended claims. Numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that the subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the implementations.

What is claimed is:
 1. A computer-implemented method, comprising: at a computer system having memory and one or more processors: receiving a plurality of training web sessions, each training web session including one or more web events generated by a respective known user having one or more demographic attributes; training one or more binary classifiers using the plurality of training web sessions and the demographic attributes of the associated users, wherein the training further comprises: extracting a set of features from the web events of a training web session, wherein the web events are generated by a known user who has one or more demographic attributes; for each web event, identifying one or more event-level characteristics based on a web content categorization model; aggregating the event-level characteristics associated with the respective web events into a set of features associated with the web session, wherein each feature has an associated weight that is based on aggregated number of web events having the feature; defining a map between the set of features and one or more of the demographic attributes associated with the user; and repeating said extracting and defining operations for one or more of the plurality of training web sessions to aggregate the maps associated with respective training web sessions into a demographic model; receiving a plurality of target web sessions, each target web session including one or more web events that are generated by a respective unknown member of a group of users, wherein each user has one or more demographic attributes; and applying one or more of the binary classifiers to the plurality of target web sessions such that a respective target web session is uniquely associated with a member of the group of users based on, at least in part, the demographic attributes of the member of the group of users.
 2. The computer-implemented method of claim 1, wherein the group of users are members of a known household that has agreed to report its web events to a web survey entity.
 3. The computer-implemented method of claim 1, wherein the one or more demographic attributes include at least one of gender, age, education level, and occupation.
 4. The computer-implemented method of claim 1, wherein at least one of the training web sessions is generated by a household that has only one household member.
 5. The computer-implemented method of claim 1, wherein the one or more binary classifiers include at least one of gender classifier, age classifier, education level classifier, and occupation classifier.
 6. The computer-implemented method of claim 1, further comprising: ordering the set of features based on their respective weights; selecting a subset of the features whose weights are higher than a predefined threshold level; and converting the subset of features into a binary vector of a multi-dimensional space defined by the web content categorization model.
 7. A computer-implemented method, comprising: at a computer system having memory and one or more processors: receiving a plurality of web sessions, wherein each web session includes at least one web event associated with a first website, and wherein each web session is generated by a respective unknown member of a group of users, each user having one or more known demographic attributes; applying one or more demographic binary classifiers to the plurality of web sessions such that each web session is uniquely associated with a respective member of the group of users based on, at least in part, the demographic attributes of the respective member of the group of users; and defining a website demographic profile for the first website based on an aggregation of the demographic attributes associated with respective members associated with the plurality of web sessions.
 8. The computer-implemented method of claim 7, wherein the demographic profile includes a gender ratio between male visitors of the website and female visitors of the website.
 9. The computer-implemented method of claim 7, wherein the demographic profile includes an age ratio between visitors of the website whose ages are above a threshold level and visitors of the website whose ages are below the threshold level.
 10. The computer-implemented method of claim 7, wherein the demographic profile includes an educational level ratio between visitors of the website whose educational levels are above a threshold level and visitors of the website whose educational levels are below the threshold level.
 11. A computer system, comprising: one or more processors; memory; and one or more program modules stored in the memory and configured for execution by the one or more processors, the one or more program modules comprising instructions for: receiving a plurality of training web sessions, each training web session including one or more web events generated by a respective known user having one or more demographic attributes; training one or more binary classifiers using the plurality of training web sessions and the demographic attributes of the associated users, wherein the training further comprises: extracting a set of features from the web events of a training web session, wherein the web events are generated by a known user who has one or more demographic attributes; for each web event, identifying one or more event-level characteristics based on a web content categorization model; aggregating the event-level characteristics associated with the respective web events into a set of features associated with the web session, wherein each feature has an associated weight that is based on aggregated number of web events having the feature; defining a map between the set of features and one or more of the demographic attributes associated with the user; and repeating said extracting and defining operations for one or more of the plurality of training web sessions to aggregate the maps associated with respective training web sessions into a demographic model; receiving a plurality of target web sessions, each target web session including one or more web events that are generated by a respective unknown member of a group of users, wherein each user has one or more demographic attributes; and applying one or more of the binary classifiers to the plurality of target web sessions such that a respective target web session is uniquely associated with a member of the group of users based on, at least in part, the demographic attributes of the member of the group of users.
 12. The computer system of claim 11, wherein the group of users are members of a known household that has agreed to report its web events to a web survey entity.
 13. The computer system of claim 11, wherein at least one of the training web sessions is generated by a household that has only one household member.
 14. The computer system of claim 11, wherein the one or more binary classifiers include at least one of gender classifier, age classifier, education level classifier, and occupation classifier.
 15. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer system that includes one or more processors and memory, the one or more programs comprising instructions for: receiving a plurality of training web sessions, each training web session including one or more web events generated by a respective known user having one or more demographic attributes; training one or more binary classifiers using the plurality of training web sessions and the demographic attributes of the associated users, wherein the training further comprises: extracting a set of features from the web events of a training web session, wherein the web events are generated by a known user who has one or more demographic attributes; for each web event, identifying one or more event-level characteristics based on a web content categorization model; aggregating the event-level characteristics associated with the respective web events into a set of features associated with the web session, wherein each feature has an associated weight that is based on aggregated number of web events having the feature; defining a map between the set of features and one or more of the demographic attributes associated with the user; and repeating said extracting and defining operations for one or more of the plurality of training web sessions to aggregate the maps associated with respective training web sessions into a demographic model; receiving a plurality of target web sessions, each target web session including one or more web events that are generated by a respective unknown member of a group of users, wherein each user has one or more demographic attributes; and applying one or more of the binary classifiers to the plurality of target web sessions such that a respective target web session is uniquely associated with a member of the group of users based on, at least in part, the demographic attributes of the member of the group of users.
 16. The non-transitory computer readable storage medium of claim 15, wherein the group of users are members of a known household that has agreed to report its web events to a web survey entity.
 17. The non-transitory computer readable storage medium of claim 15, wherein at least one of the training web sessions is generated by a household that has only one household member.
 18. The non-transitory computer readable storage medium of claim 15, wherein the one or more binary classifiers include at least one of gender classifier, age classifier, education level classifier, and occupation classifier.